

Unsupervised decoding of encoded reasoning using language model interpretability

Fang, Ching; Marks, Samuel

arXiv.org Artificial Intelligence

As large language models become increasingly capable, there is growing concern that they may develop reasoning processes that are encoded or hidden from human oversight. To investigate whether current interpretability techniques can penetrate such encoded reasoning, we construct a controlled testbed by fine-tuning a reasoning model (DeepSeek-R1-Distill-Llama-70B) to perform chain-of-thought reasoning in ROT-13 encryption while maintaining intelligible English outputs. We evaluate mechanistic interpretability methods, in particular logit lens analysis, on their ability to decode the model's hidden reasoning process using only internal activations. We show that logit lens can effectively translate encoded reasoning, with accuracy peaking in intermediate-to-late layers. Finally, we develop a fully unsupervised decoding pipeline that combines logit lens with automated paraphrasing, achieving substantial accuracy in reconstructing complete reasoning transcripts from internal model representations. These findings suggest that current mechanistic interpretability techniques may be more robust to simple forms of encoded reasoning than previously understood. Our work provides an initial framework for evaluating interpretability methods against models that reason in non-human-readable formats, contributing to the broader challenge of maintaining oversight over increasingly capable AI systems.
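For readers unfamiliar with the cipher, ROT-13 replaces each letter with the one 13 positions later in the alphabet, so applying it twice returns the original text. The sketch below only illustrates the cipher itself; it is not the authors' fine-tuning or data-generation code, and the function name `rot13` is a hypothetical choice for this example. (Python's standard library also exposes the same transform as `codecs.encode(text, "rot_13")`.)

```python
import string


def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; leave all other characters unchanged."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # Map a..z -> n..m and A..Z -> N..M; digits, spaces and punctuation pass through.
    table = str.maketrans(
        lower + upper,
        lower[13:] + lower[:13] + upper[13:] + upper[:13],
    )
    return text.translate(table)


if __name__ == "__main__":
    encoded = rot13("Let me think step by step.")
    print(encoded)         # Yrg zr guvax fgrc ol fgrc.
    print(rot13(encoded))  # applying ROT-13 twice restores the original
```

Because the mapping is a fixed letter substitution, a model reasoning in ROT-13 still manipulates the same underlying content, which is one intuition for why intermediate-layer readouts such as the logit lens might recover it.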


Rise Of The Robot Bees: Tiny Drones Turned Into Artificial Pollinators

NPR Technology

An artist's illustration shows how a remote-controlled drone might one day be used to pollinate flowers. (Courtesy of Dr. Eijiro Miyako)

Near Esparto, in the beautiful Capay Valley region of central California, 1,400 young almond trees flourish in a century-old orchard overlooking the hills. Since November, they've stood in perfect rows without a hint of foliage, resting naked and dormant for the upcoming growing season. Their branches now swell with bright pastel blooms in preparation for pollination. Like most almond growers, Brian Paddock, owner of Capay Hills Orchard, relies on bees to provide this important aspect of crop development.


Teaching Machines to Learn on Their Own

AITopics Original Links

Steve Mirsky: Welcome to Scientific American's Science Talk, posted on November 10, 2015. A short episode today, for which I'll turn it over now to Scientific American's associate tech editor, Larry Greenemeier.

Larry Greenemeier: Computers have always been good at doing things that are really complicated for us humans. On the other hand, computers have a really hard time recognizing a particular voice or face in a crowd, something most kids learn to do before they're even out of diapers. But things are changing fast. Over the next decade or so, machines will more easily mimic inherently human abilities.